The first windowing capabilities appeared in SQL Server 2005 with the introduction of the OVER clause and a set of four ranking functions: ROW_NUMBER, RANK, DENSE_RANK, and NTILE.
In our discussion, the term “window” refers to the scope of visibility
from one row in a result set relative to neighboring rows in the same
result set. By default, OVER produces a single window over the entire result set, but its associated PARTITION BY
clause lets you divide the result set into multiple groups, each
contained inside its own window. The row sequence within each window
is determined by an associated ORDER BY clause, and based on this sequence, the ranking functions assign an accumulating value to the rows in the window.

In addition to the ranking functions, the OVER clause can be used with the traditional aggregate functions SUM, COUNT, MIN, MAX, and AVG. When doing so, you do not specify the GROUP BY
clause that’s normally required with the aggregate functions. Instead,
each row calculates an aggregation based on the window of rows defined
with OVER, optionally partitioned using PARTITION BY.
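
The pre-2012 capabilities described above can be sketched in a single query. This is an illustrative example only: it reads from an inline VALUES list (not a table from this chapter) so that it runs on its own, and the data values and column aliases are assumptions chosen for the sketch. The ranking functions require ORDER BY inside OVER, while the aggregates here use no ORDER BY, so every row in a window receives the same window-wide value.

```sql
-- Illustrative data only; the AcctId and Amount values are made up for this sketch.
SELECT AcctId, Amount,
  RowNum    = ROW_NUMBER() OVER (PARTITION BY AcctId ORDER BY Amount),
  Rnk       = RANK()       OVER (PARTITION BY AcctId ORDER BY Amount),
  DenseRnk  = DENSE_RANK() OVER (PARTITION BY AcctId ORDER BY Amount),
  Half      = NTILE(2)     OVER (PARTITION BY AcctId ORDER BY Amount),
  AcctSum   = SUM(Amount)  OVER (PARTITION BY AcctId),  -- no ORDER BY: one value per partition
  TotalRows = COUNT(*)     OVER ()                      -- empty OVER: one window over the whole result set
FROM (VALUES (1, 500), (1, 250), (1, 250), (2, 50), (2, 100)) AS t (AcctId, Amount)
ORDER BY AcctId, Amount
```

All five detail rows are returned. AcctSum is 1000 on every row of account 1 and 150 on every row of account 2, and TotalRows is 5 throughout.
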
This is certainly useful, because it allows you to obtain aggregations
without being forced to consolidate (and lose) detail rows with a GROUP BY clause. Before SQL Server 2012, however, the aggregate functions could not also use ORDER BY in the OVER clause (as is required when using OVER with the ranking functions), making it impossible to calculate cumulative aggregations at the row level within each window. For example, you could use AVG with OVER (and, optionally, PARTITION BY), but without an associated ORDER BY, there is no designated sequence to the rows in each window, so SQL Server cannot compute a running average from one row to the next within the window. Thus, the best that AVG with OVER could do is compute the average for all the rows in the window (independent of row sequence), and then return that value for every row.

SQL Server 2012 finally addresses this shortcoming. In the following code samples, you’ll see how OVER/ORDER BY can now be used with all the traditional aggregate functions to provide running calculations within ordered windows. You’ll also learn how to frame windows using the ROWS and RANGE clauses, which adjust the size and scope of the window to enable sliding calculations.
And finally, SQL Server 2012 introduces eight new analytic functions
(covered in the next section) that are designed specifically to work
with ordered (and optionally partitioned) windows using OVER with ORDER BY (and optionally PARTITION BY).

The code in Example 1 creates a table populated with financial transactions from several different accounts. Tangentially, note the use of the new DATEFROMPARTS function (also covered in the next section), which is used to construct a date value from year, month, and day parameters.

Example 1. Preparing sample transaction data for querying with window functions.

CREATE TABLE TxnData (AcctId int, TxnDate date, Amount decimal)
GO
INSERT INTO TxnData (AcctId, TxnDate, Amount) VALUES
(1, DATEFROMPARTS(2012, 4, 10), 500), -- 5 transactions for acct 1
(1, DATEFROMPARTS(2012, 4, 22), 250),
(1, DATEFROMPARTS(2012, 4, 24), 75),
(1, DATEFROMPARTS(2012, 4, 26), 125),
(1, DATEFROMPARTS(2012, 4, 28), 175),
(2, DATEFROMPARTS(2012, 4, 11), 500), -- 8 transactions for acct 2
(2, DATEFROMPARTS(2012, 4, 15), 50),
(2, DATEFROMPARTS(2012, 4, 22), 5000),
(2, DATEFROMPARTS(2012, 4, 25), 550),
(2, DATEFROMPARTS(2012, 4, 27), 105),
(2, DATEFROMPARTS(2012, 4, 27), 95),
(2, DATEFROMPARTS(2012, 4, 29), 100),
(2, DATEFROMPARTS(2012, 4, 30), 2500),
(3, DATEFROMPARTS(2012, 4, 14), 500), -- 4 transactions for acct 3
(3, DATEFROMPARTS(2012, 4, 15), 600),
(3, DATEFROMPARTS(2012, 4, 22), 25),
(3, DATEFROMPARTS(2012, 4, 23), 125)

In SQL Server 2012, an ORDER BY clause may be specified with OVER to produce running aggregations within each window, as Example 2 demonstrates:

Example 2. Using OVER with ORDER BY to produce running aggregations.

SELECT AcctId, TxnDate, Amount,
RAvg = AVG(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate),
RCnt = COUNT(*) OVER (PARTITION BY AcctId ORDER BY TxnDate),
RMin = MIN(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate),
RMax = MAX(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate),
RSum = SUM(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate)
FROM TxnData
ORDER BY AcctId, TxnDate

AcctId TxnDate    Amount RAvg        RCnt RMin RMax RSum
------ ---------- ------ ----------- ---- ---- ---- ----
1      2012-04-10 500    500.000000  1    500  500  500
1      2012-04-22 250    375.000000  2    250  500  750
1      2012-04-24 75     275.000000  3    75   500  825
1      2012-04-26 125    237.500000  4    75   500  950
1      2012-04-28 175    225.000000  5    75   500  1125
2      2012-04-11 500    500.000000  1    500  500  500
2      2012-04-15 50     275.000000  2    50   500  550
2      2012-04-22 5000   1850.000000 3    50   5000 5550
:

The results of this query are partitioned (windowed) by
account. Within each window, the account’s running averages, counts,
minimum/maximum values, and sums are ordered by transaction date,
showing the chronologically accumulated values for each account. No ROWS or RANGE clause is specified (both are explained next), so RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
is assumed by default. This yields a window frame that spans from
the beginning of the partition (the first row of each account) through
the current row, along with any rows that share the current row’s TxnDate. When the account ID changes, the previous window is
“closed” and new calculations start running for a new window over the next account ID.

You can also narrow each account’s window by framing it with a ROWS clause in the OVER clause. This enables sliding calculations, as demonstrated in Example 3:

Example 3. Using OVER with ORDER BY and PRECEDING to produce sliding aggregations.

SELECT AcctId, TxnDate, Amount,
SAvg = AVG(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW),
SCnt = COUNT(*) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING),
SMin = MIN(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING),
SMax = MAX(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING),
SSum = SUM(Amount) OVER (PARTITION BY AcctId ORDER BY TxnDate ROWS 2 PRECEDING)
FROM TxnData
ORDER BY AcctId, TxnDate

AcctId TxnDate    Amount SAvg        SCnt SMin SMax SSum
------ ---------- ------ ----------- ---- ---- ---- ----
1      2012-04-10 500    500.000000  1    500  500  500
1      2012-04-22 250    375.000000  2    250  500  750
1      2012-04-24 75     275.000000  3    75   500  825
1      2012-04-26 125    150.000000  3    75   250  450
1      2012-04-28 175    125.000000  3    75   175  375
2      2012-04-11 500    500.000000  1    500  500  500
2      2012-04-15 50     275.000000  2    50   500  550
2      2012-04-22 5000   1850.000000 3    50   5000 5550
:

This slightly modified version of the previous query specifies ROWS BETWEEN 2 PRECEDING AND CURRENT ROW in the OVER clause for the SAvg
column, overriding the default window size. Specifically, it frames the
window within each account’s partition to a maximum of three rows: the
current row, the row before it, and one more row before that one. Once
the window expands to three rows, it stops growing and starts sliding
down the subsequent rows until a new partition (the next account) is
encountered. The BETWEEN…AND CURRENT ROW
keywords that specify the upper bound of the window are assumed by
default, so to reduce code clutter, the other column definitions specify
just the lower bound of the window with the shorter variation ROWS 2 PRECEDING.

Notice
how the window “slides” within each account. For example, the sliding
maximum for account 1 drops from 500 to 250 in the fourth row, because
250 is the largest value in the window of three rows that begins two
rows earlier—and the 500 from the very first row is no longer visible in
that window. Similarly, the sliding
sum for each account is based on the defined window. Thus, the sliding
sum of 375 on the last row of account 1 is the total sum of that row
(175) plus the two preceding rows (75 + 125) only, not the total sum for all transactions in the entire account, as the running sum had calculated.

Finally, RANGE can be used instead of ROWS to handle “ties” within a window. Although ROWS treats each row in the window distinctly, RANGE will merge rows containing duplicate ORDER BY values, as demonstrated by Example 4:

Example 4. Comparing ROWS and RANGE for calculating window functions.

SELECT AcctId, TxnDate, Amount,
SumByRows = SUM(Amount) OVER (ORDER BY TxnDate ROWS UNBOUNDED PRECEDING),
SumByRange = SUM(Amount) OVER (ORDER BY TxnDate RANGE UNBOUNDED PRECEDING)
FROM TxnData
WHERE AcctId = 2
ORDER BY TxnDate

AcctId TxnDate    Amount SumByRows SumByRange
------ ---------- ------ --------- ----------
2      2012-04-11 500    500       500
2      2012-04-15 50     550       550
2      2012-04-22 5000   5550      5550
2      2012-04-25 550    6100      6100
2      2012-04-27 105    6205      6300
2      2012-04-27 95     6300      6300
2      2012-04-29 100    6400      6400
2      2012-04-30 2500   8900      8900

In this result set, ROWS and RANGE
both return the same values, with the exception of the fifth row.
Because the fifth and sixth rows are both tied for the same date
(4/27/2012), RANGE returns the combined running sum for both rows. ROWS “catches up” with RANGE on the sixth row, and from the seventh row (for 4/29/2012) onward, the tie is broken and both return the same running totals for the rest of the window.
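
As a follow-up sketch to Example 4, the query below compares an OVER clause that has ORDER BY but no frame against explicit RANGE and ROWS frames. It reads from an inline VALUES list rather than TxnData so that it runs on its own; the TxnId column and the column aliases are assumptions made for this illustration and are not part of the chapter's sample table. SQL Server applies RANGE UNBOUNDED PRECEDING when ORDER BY is present without a frame, so NoFrame should match SumByRange on the tied dates. Adding the hypothetical TxnId as a tiebreaker gives the ROWS calculation a deterministic row order.

```sql
-- Illustrative data only; TxnId is a hypothetical unique key added for this sketch.
SELECT TxnId, TxnDate, Amount,
  NoFrame    = SUM(Amount) OVER (ORDER BY TxnDate),                                 -- default frame
  SumByRange = SUM(Amount) OVER (ORDER BY TxnDate RANGE UNBOUNDED PRECEDING),       -- ties share a frame
  SumByRows  = SUM(Amount) OVER (ORDER BY TxnDate, TxnId ROWS UNBOUNDED PRECEDING)  -- one row at a time
FROM (VALUES
    (1, CAST('2012-04-25' AS date), 550),
    (2, CAST('2012-04-27' AS date), 105),
    (3, CAST('2012-04-27' AS date), 95),
    (4, CAST('2012-04-29' AS date), 100)
  ) AS t (TxnId, TxnDate, Amount)
ORDER BY TxnDate, TxnId
```

For the two rows dated 2012-04-27, NoFrame and SumByRange both return 750, while SumByRows returns 655 and then 750.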